Learning with Class Skews and Small Disjuncts
Abstract
One of the main objectives of a Machine Learning (ML) system is to induce a classifier that minimizes classification errors. Two relevant topics in ML are understanding which domain characteristics and which inducer limitations might increase misclassification. In this sense, this work analyzes two important issues that might influence the performance of ML systems: class imbalance and error-prone small disjuncts. Our main objective is to investigate how these two aspects are related to each other. Aiming to overcome both problems, we analyze the behavior of two over-sampling methods we have proposed, namely Smote + Tomek links and Smote + ENN. Our results suggest that these methods are effective for dealing with class imbalance and, in some cases, might help rule out some undesirable disjuncts. However, in some cases a simpler method, random over-sampling, provides comparable results while requiring fewer computational resources.
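The methods named in the abstract combine an over-sampler with a data-cleaning step: a Tomek link is a pair of mutual nearest neighbors from different classes, and removing such pairs cleans the class boundary after over-sampling. As a rough illustration only (not the authors' implementation; the dataset and parameters below are invented), a Tomek-link detector and a plain random over-sampler can be sketched in NumPy:

```python
import numpy as np

def tomek_links(X, y):
    # A pair (i, j) forms a Tomek link when i and j are each other's
    # nearest neighbor and carry different class labels.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    nn = d.argmin(axis=1)                # index of each point's nearest neighbor
    return [(i, int(j)) for i, j in enumerate(nn)
            if nn[j] == i and y[i] != y[j] and i < j]

def random_oversample(X, y, rng=None):
    # Duplicate randomly chosen minority-class examples until every
    # class reaches the size of the majority class.
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts_X, parts_y = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = rng.choice(np.flatnonzero(y == c), size=target - n)
            parts_X.append(X[idx])
            parts_y.append(y[idx])
    return np.vstack(parts_X), np.concatenate(parts_y)
```

A Smote-style combined method would generate synthetic (interpolated) minority examples instead of duplicates and then drop the instances that participate in Tomek links; the sketch above only shows the two building blocks.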
Similar Articles
Learning with Rare Cases and Small Disjuncts
Systems that learn from examples often create a disjunctive concept definition. Small disjuncts are those disjuncts which cover only a few training examples. The problem with small disjuncts is that they are more error-prone than large disjuncts. This paper investigates the reasons why small disjuncts are more error-prone than large disjuncts. It shows that when there are rare cases within a do...
The Impact of Small Disjuncts on Classifier Learning
Many classifier induction systems express the induced classifier in terms of a disjunctive description. Small disjuncts are those disjuncts that classify few training examples. These disjuncts are interesting because they are known to have a much higher error rate than large disjuncts and are responsible for many, if not most, classification errors. Previous research has investigated thi...
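Disjunct size is simply the number of training examples a rule covers, and its error rate is the fraction of those examples the rule mislabels. As a purely hypothetical illustration (the rules and data below are invented, not taken from any of the cited papers), the statistic can be computed like this:

```python
def disjunct_stats(rules, X, y):
    # rules: list of (predicate, predicted_label) pairs, one per disjunct.
    # Returns, for each rule, (coverage, training error rate).
    stats = []
    for pred, label in rules:
        covered = [yi for xi, yi in zip(X, y) if pred(xi)]
        n = len(covered)
        errors = sum(1 for yi in covered if yi != label)
        stats.append((n, errors / n if n else 0.0))
    return stats

# Hypothetical rule set: one large disjunct, one small disjunct.
rules = [(lambda x: x[0] > 0.5, 1),
         (lambda x: x[0] <= 0.5, 0)]
X = [(0.9,), (0.8,), (0.7,), (0.6,), (0.2,), (0.3,)]
y = [1, 1, 1, 0, 0, 1]
```

On this toy data the small disjunct (coverage 2) has twice the error rate of the large one (coverage 4), mirroring the pattern the papers above report.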
Reducing the Small Disjuncts Problem by Learning Probabilistic Concept Descriptions
Concept learners that learn concept descriptions consisting of rules have been shown to be prone to the small disjuncts problem (Holte et al., 1989). This is the problem where a large proportion of the overall classification error made by the concept description on an independent test set can be attributed to rules which were true for a small number of training examples. In noisy domains, such c...
Diversifying Support Vector Machines for Boosting using Kernel Perturbation: Applications to Class Imbalance and Small Disjuncts
The diversification (generating slightly varying separating discriminators) of Support Vector Machines (SVMs) for boosting has proven to be a challenge due to the strong learning nature of SVMs. Based on the insight that perturbing the SVM kernel may help in diversifying SVMs, we propose two kernel perturbation based boosting schemes where the kernel is modified in each round so as to increase ...
A Quantitative Study of Small Disjuncts
Systems that learn from examples often express the learned concept in the form of a disjunctive description. Disjuncts that correctly classify few training examples are known as small disjuncts and are interesting to machine learning researchers because they have a much higher error rate than large disjuncts. Previous research has investigated this phenomenon by performing ad hoc analyses of a ...
Journal:
Volume / Issue:
Pages: -
Publication date: 2004